NSF PAR Search | NSF Public Access Repository

Diversify, Don’t Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

Yu, Zhuoran; Zhu, Chenchen; Culatana, Sean; Krishnamoorthi, Raghuraman; Xiao, Fanyi; Lee, Yong Jae (January 2025, Transactions on Machine Learning Research)

Delving Deeper into Anti-Aliasing in ConvNets

https://doi.org/10.1007/s11263-022-01672-y

Zou, Xueyan; Xiao, Fanyi; Yu, Zhiding; Li, Yuheng; Lee, Yong Jae (January 2022, International Journal of Computer Vision)

Aliasing refers to the phenomenon in which high-frequency signals degenerate into completely different ones after sampling. It arises as a problem in deep learning because downsampling layers are widely adopted in deep architectures to reduce parameters and computation. The standard solution is to apply a low-pass filter (e.g., Gaussian blur) before downsampling. However, applying the same filter across the entire content can be suboptimal, as the frequency of feature maps can vary across both spatial locations and feature channels. To tackle this, we propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and channel group of the input feature maps. We investigate the effectiveness and generalization of the proposed method across multiple tasks, including image classification, semantic segmentation, instance segmentation, video instance segmentation, and image-to-image translation. Both qualitative and quantitative results demonstrate that our approach effectively adapts to the different feature frequencies to avoid aliasing while preserving useful information for recognition. Code is available at https://maureenzou.github.io/ddac/
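The Python sketch below illustrates the abstract's ideas in 1D. Naive stride-2 subsampling aliases the highest-frequency pattern into a constant, and a fixed blur removes it. A content-aware layer instead predicts a separate kernel for each position and channel group. This is a minimal sketch, not the authors' implementation. `toy_predictor` is a hypothetical hand-written rule standing in for the learned weight-prediction branch, and circular padding is used only for brevity. The softmax normalization, which gives non-negative taps summing to one (a local weighted average), is likewise an assumption made for illustration.

```python
import math

def downsample(x, stride=2):
    """Naive subsampling: keep every `stride`-th sample (no filtering)."""
    return x[::stride]

def fixed_blur(x, kernel=(0.25, 0.5, 0.25)):
    """Standard fix: one fixed low-pass kernel applied at every position.
    Borders use circular padding to keep the example short."""
    n, r = len(x), len(kernel) // 2
    return [sum(kernel[t] * x[(i + t - r) % n] for t in range(len(kernel)))
            for i in range(n)]

def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def adaptive_blur(feat, groups, predictor, k=3):
    """Content-aware filtering of a 1D feature map `feat` (C channels x N positions).

    `predictor` is a hypothetical stand-in for the learned weight-prediction
    branch: it maps the local C x k window to groups*k logits. Every
    (position, channel group) pair gets its own k-tap kernel; softmax makes
    the taps non-negative and sum to one (an assumed normalization).
    """
    C, N, r = len(feat), len(feat[0]), k // 2
    assert C % groups == 0, "channels must split evenly into groups"
    per_group = C // groups
    out = [[0.0] * N for _ in range(C)]
    for i in range(N):
        window = [[feat[c][(i + t - r) % N] for t in range(k)] for c in range(C)]
        logits = predictor(window)
        for g in range(groups):
            w = softmax(logits[g * k:(g + 1) * k])
            for c in range(g * per_group, (g + 1) * per_group):
                out[c][i] = sum(w[t] * window[c][t] for t in range(k))
    return out

def toy_predictor(window, groups=2):
    """Hypothetical rule (groups must match adaptive_blur's): smooth where the
    group's local content varies a lot, keep the centre tap where it is smooth."""
    C, k = len(window), len(window[0])
    per_group = C // groups
    logits = []
    for g in range(groups):
        chans = window[g * per_group:(g + 1) * per_group]
        variation = sum(abs(ch[t + 1] - ch[t]) for ch in chans for t in range(k - 1))
        centre = math.log(2.0) if variation > 1.0 else 10.0  # [1,2,1]/4 vs ~identity
        logits += [centre if t == k // 2 else 0.0 for t in range(k)]
    return logits

if __name__ == "__main__":
    signal = [1.0, -1.0] * 4                   # highest representable frequency
    print(downsample(signal))                  # [1.0, 1.0, 1.0, 1.0]: aliased to a constant
    print(downsample(fixed_blur(signal)))      # [0.0, 0.0, 0.0, 0.0]: removed, not aliased

    step = [0.0] * 4 + [1.0] * 4               # low-frequency content worth keeping
    out = adaptive_blur([signal, step], groups=2, predictor=toy_predictor)
    print([round(v, 3) for v in downsample(out[0])])  # high-frequency channel suppressed
    print([round(v, 3) for v in downsample(out[1])])  # step edge kept sharp
```

Each channel group gets its own kernel at each position, so the oscillating channel is smoothed before subsampling while the step channel passes through almost unchanged. A single fixed [1, 2, 1]/4 kernel would instead blur the step edge.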
YolactEdge: Real-time Instance Segmentation on the Edge

Liu, Haotian; Rivera-Soto, Rafael; Xiao, Fanyi; Lee, Yong Jae (January 2021, IEEE International Conference on Robotics and Automation (ICRA))

YOLACT++: Better Real-time Instance Segmentation

https://doi.org/10.1109/TPAMI.2020.3014297

Bolya, Daniel; Zhou, Chong; Xiao, Fanyi; Lee, Yong Jae (January 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Delving Deeper into Anti-aliasing in ConvNets

Zou, Xueyan; Xiao, Fanyi; Yu, Zhiding; Lee, Yong Jae (January 2020, BMVC)
Video Object Detection with an Aligned Spatial-Temporal Memory

Xiao, Fanyi; Lee, Yong Jae (January 2018, Proceedings of the European Conference on Computer Vision (ECCV))

We introduce Spatial-Temporal Memory Networks for video object detection. At its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM's design enables full integration of pretrained backbone CNN weights, which we find to be critical for accurate detection. Furthermore, in order to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. Our method produces state-of-the-art results on the benchmark ImageNet VID dataset, and our ablative studies clearly demonstrate the contribution of our different design choices.
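The pure-Python sketch below illustrates the alignment idea behind MatchTrans under stated assumptions. For every location in frame t, it computes dot-product affinities between the current feature vector and the previous frame's features in a (2k+1) x (2k+1) neighbourhood. It then normalizes those affinities by their sum and uses them to take a weighted average of the previous memory. Several details are our choices for illustration and may differ from the paper's exact formulation: the sum normalization, the assumption of non-negative (post-ReLU) features, and the fallback that keeps the memory in place when all affinities are zero. The STMM recurrence itself is not shown.

```python
def match_trans(feat_t, feat_prev, mem_prev, k=1):
    """Warp the previous memory toward the current frame (illustrative sketch).

    feat_t, feat_prev: H x W grids of D-dim feature vectors (lists of floats),
        assumed non-negative (e.g. post-ReLU).
    mem_prev: H x W grid of memory vectors to be aligned.
    k: neighbourhood radius; candidates lie in a (2k+1) x (2k+1) window.
    """
    H, W = len(feat_t), len(feat_t[0])
    aligned = []
    for y in range(H):
        row = []
        for x in range(W):
            # Affinity between the current location and each nearby previous location.
            cands = []
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        a = sum(p * q for p, q in zip(feat_t[y][x], feat_prev[yy][xx]))
                        cands.append((a, mem_prev[yy][xx]))
            total = sum(a for a, _ in cands)
            if total <= 0:  # no evidence of a match: keep the memory in place
                row.append(list(mem_prev[y][x]))
                continue
            dim = len(mem_prev[y][x])
            # Affinity-weighted average of the neighbouring previous memory.
            row.append([sum(a * m[d] for a, m in cands) / total for d in range(dim)])
        aligned.append(row)
    return aligned

if __name__ == "__main__":
    obj, bg = [1.0, 0.0], [0.0, 1.0]          # toy 2-D features: object vs background
    feat_prev = [[obj, bg, bg]]               # 1 x 3 frame, object at x = 0
    feat_t = [[bg, obj, bg]]                  # object has moved to x = 1
    mem_prev = [[[5.0], [0.0], [0.0]]]        # memory remembers the object at x = 0
    print(match_trans(feat_t, feat_prev, mem_prev))  # [[[0.0], [5.0], [0.0]]]
```

In the toy example the object feature moves one pixel to the right between frames. The memory value attached to it moves along with it rather than staying at the stale location, which is the behaviour the alignment step is meant to provide.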
